Stochastic Gradient Descent
- Update the weights using a single randomly chosen training example instead of averaging the gradient over the whole dataset.
 
- Very noisy gradient estimates
 
- But each update is cheap and fast
 
- In practice, converges much faster than full-batch gradient descent on typical machine learning problems
 
- Works because training samples are redundant: many examples carry similar gradient information, so computing the exact gradient over every sample before each update is wasteful
 
- The main reason to use mini-batches at all is that hardware processes a batch more efficiently than the same number of individual samples
 
- The gradient computation parallelizes across samples in a simple way, and batching is the easiest means of exploiting that parallelism (see the sketch below)
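To make the contrast concrete, here is a minimal sketch in NumPy comparing the three update rules: full-batch gradient descent, single-sample SGD, and mini-batch SGD. The least-squares objective, the synthetic data, and all names and learning rates (`X`, `y`, `w`, `grad`, `lr`) are illustrative assumptions, not from the notes.

```python
import numpy as np

# Synthetic least-squares problem (illustrative assumption):
# y = X @ w_true + noise, and we recover w by minimizing mean squared error.
rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def grad(w, Xb, yb):
    # Gradient of 0.5 * mean((Xb @ w - yb)**2) with respect to w.
    return Xb.T @ (Xb @ w - yb) / len(yb)

# Full-batch gradient descent: one exact update per pass over all n samples.
w = np.zeros(d)
for _ in range(100):
    w -= 0.1 * grad(w, X, y)

# Stochastic gradient descent: one noisy, cheap update per single sample,
# so n updates per pass instead of one.
w = np.zeros(d)
for _ in range(100):
    for i in rng.permutation(n):
        w -= 0.01 * grad(w, X[i:i+1], y[i:i+1])

# Mini-batch SGD: the hardware-friendly middle ground the notes mention.
w = np.zeros(d)
batch = 32
for _ in range(100):
    idx = rng.permutation(n)
    for s in range(0, n, batch):
        b = idx[s:s + batch]
        w -= 0.05 * grad(w, X[b], y[b])
```

Note the trade-off the notes describe: SGD makes n updates in the time batch descent makes one, and the mini-batch version recovers most of that speed while letting the hardware vectorize each gradient computation over 32 samples at once.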